Characterizing protein crystal contacts and their role in protein crystallization 
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Crystallography may be the gold standard of protein structure determination, but obtaining the 
necessary high-quality crystals is akin to prospecting for the precious mineral. The fields of struc- 
tural biology and soft matter have independently sought out fundamental principles to rationalize 
the process, but the conceptual differences and the limited crosstalk between the two disciplines have 
prevented a comprehensive understanding of the phenomenon to emerge. Here we conduct a com- 
putational study of proteins from the rubredoxin family that bridges the two fields. Using atomistic 
simulations, we characterize the crystal contacts, and then parameterize patchy particle models. 
Comparing the phase diagrams of these models with experimental results enables us to critically 
examine the assumptions behind the two approaches and to reveal key features of protein-protein 
interactions that facilitate their crystallization. 
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The challenges of designing better drugs [T] |2] and bio- 
materials [3 rest in large part on determining the target 
proteins' precise structure. Incomplete structural infor- 
mation for a number of biologically important proteins, 
such as the ribosome i5j , also limits our understanding 
of their function. In spite of the recent advances in NMR 
techniques [6 , X-ray and neutron diffraction crystallog- 
raphy remain the methods of choice for high-precision 
protein structure determinations. Yet the lack of system- 
atic ways to crystallize proteins limits the applicability of 
crystallography [7-9 . An understanding of protein crys- 
tallization should follow from a detailed description of 
the protein-protein interactions J7\ [TQHT2] . But although 
the individual interaction types (e.g. hydrogen bonding, 
van der Waals, electrostatics, hydrophobic, etc.) are well 
characterized p!3HT7] , the effect of their collective behav- 
ior on protein assembly is still unclear p!8l . 

In the field of structural biology, protein-protein inter- 
actions have historically been divided in two categories: 
specific interactions responsible for protein complex for- 
mation or for protein-target association, and nonspecific 
interactions, such as those that underlie most proteins' 
crystallization [19]. Specific interactions are evolution- 
arily tuned to be free-energetically and geometrically se- 
lective. Nonspecific interactions are on average weaker, 
having possibly evolved as such to prevent pathological 
aggregation ^HHO]. Crystal contacts, which are one of 
the main sources of information on nonspecific interac- 
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tions, were also long considered stochastic, i.e., essen- 
tially indistinguishable from randomly selected surface 
patches [21-23 . Two recent studies suggest otherwise. 
Cieslik and Derewenda found that crystal contacts are en- 
riched for glycine and small hydrophobic residues, while 
large polar residues with high side-chain entropy are un- 
derrepresented [24]. Price et al. found that the proteins 
with a larger fraction of glycine and alanine on their 
surface are more likely to have been successfully crys- 
tallized [25 . These observations have provided statisti- 
cal support for the surface entropy reduction (SER) mu- 
tagenesis approach, which recommends replacing high- 
entropy surface residues with alanine to ease crystal for- 
mation [26l |27] . More importantly, they reveal that crys- 
tal contacts are not completely stochastic. 

In the field of soft matter, the observation that short- 
range isotropic attraction between particles results in a 
gas-liquid critical point that lies below the crystal solubil- 
ity regime |28l [29] suggested an analogy for the solution 
behavior of proteins [30^^ . In these systems, success- 
ful crystallization can most easily be achieved in the re- 
gion between the solubility line, above which the system 
does not aggregate because the disperse phase is stable, 
and the critical point, below which the system typically 
forms "amorphous" materials that are useless for crys- 
tallography [33 . The enhancement of homogeneous nu- 
cleation rates in the critical region [34H36] further gave a 
physical basis to George and Wilson's observation that a 
small second virial coefficient is a predictor of the nucle- 
ation zone of globular proteins [37]. Though appealing 
in their simplicity, isotropic interactions between proteins 
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were soon proven too simplistic to explain certain solu- 
bilization trends jSO l [38ti4T] . In response, a broad array 
of schematic models with anisotropic, directional attrac- 
tion, i.e., "patchy" models, were developed [4?, T3 . 

Despite the convergence between the structural biol- 
ogy and the soft matter research lines on the relevance of 
interaction anisotropy for protein crystallization, a large 
conceptual gap remains to be filled before reliable experi- 
mental guidance can be provided. In particular, although 
anisotropy plays a key role in crystallization [44ti46] . little 
characterization of the interaction between crystal con- 
tacts has been done [47 , leaving most of the physical 
assumptions behind patchy models untested. 

In this article, we use classical atomistic simulations 
to characterize the anisotropy in the crystal contact in- 
teractions of three closely related small globular pro- 
teins from the rubredoxin family: (a) the wild-type from 
Pyrococcus furious (wt-RbPf, PDB code: IBRF) [48;, 
its (b) W3Y/I23V/L32I mutant (mut-RbPf, PDB code: 
1IU5) [49] and (c) the W4L/R5S mutant from Pyrococ- 
cus abyssi (mut-RbPa, PDB code: 1YK4) [50]. These 
proteins from hyperthermophilic organisms, which are 
known for their extended temperature stability, provide a 
natural model system for our study. Through a compar- 
ative analysis, we identify the molecular basis of their in- 
teractions, and parameterize patchy models whose phase 
diagrams are then compared with experimental crystal- 
lization conditions. The validity of our approach is sup- 
ported by the recent success of multiscale descriptions 
of protein aggregation ^J. By showing that the mod- 
els and the experimental results agree fairly well, we (i) 
conclude that increasing the temperature may sometimes 
produce higher resolution crystals, (ii) suggest a micro- 
scopic interpretation for SER and, (iii) offer a framework 
for developing relevant patchy models for protein crys- 
tallization. 



I. RESULTS AND DISCUSSION 

Using umbrella sampling simulations (Methods, Ap- 
pendix B), we characterize the solvated protein-protein 
interaction as a function of the center of mass-center of 
mass (COM-COM) distance and of the protein's relative 
orientation for each crystal contact (1, 2, and 3) (Fig.jl]). 



A. Crystal contacts of wt-RbPf and mut-RbPf 

The orientationally-constrained potential of mean 
force (PMF) of wt-RbPf is first obtained in pure water so- 
lution (Fig.[T|\). All three contacts are strongly repulsive 
below 2.2 nm, which defines the protein diameter <j. The 
PMF minimum for interface la and 3a matches the ob- 
served crystal distance, and unsurprisingly the strength 
of the interaction correlates with the size of the interface 



(Fig. A.3). Otherwise, the interfaces share little in com- 
mon. Interface 2a is either repulsive or neutral, while 



interface la is attractive and relatively long-range (up to 
3 nm) and interface 3a is attractive and short-range (up 
to 2.45 nm). The relative abundance of hydrophobic re- 
gions in interfaces la and 3a compared to interface 2a 
(Fig. |A.4| [53^ 54^ suggests that this effect dominates the 
assembly. The range difference between the PMF of in- 
terfaces la and 3a can be explained by the stabilization 
of the former by a pair of hydrogen bonds between the 
C-terminus tail of one chain and a loop on the other. The 
tail's fiexibility extends the attraction to larger distances. 
Interestingly, the PMF of other reasonably attractive ori- 
entations result in interfaces that are essentially inacces- 
sible due to the presence of high intermediate entropic 
barriers (Appendix B, Fig. A.7). The crystal contacts 
are therefore clearly not purely stochastic nor isotropic. 

The N-terminal methionine and formylmethionine 
variants (PDB codes: 1BQ8, 1BQ9) [48] were crystal- 
lized under identical experimental conditions and show 
similar crystal structures, because the mutated region is 
not involved in any crystal contact. The core mutations 
of mut-RbPf also result in a similar crystal form. Be- 
cause of the low resolution of the C-terminus tail in this 
protein, the last two residues (Glu52 and Asp53) were 
not included in the PDB file (1IU5). Instead of assigning 
them a position, we choose to characterize the pair-wise 
interaction in the absence of the C-terminus in order to 
assess whether its presence is necessary for crystal for- 
mation. Interface lb (equivalent to la) is most affected 
by the tail's removal. Without it, the interface area is 
smaller and the interaction weaker by about 7 kJ/mol 
than in the wild- type (Fig.jl^), which indeed affects crys- 
tal formation, as we find below. 



B. Crystal contacts of mut-RbPa 

Mut-RbPa has substantially different crystal contacts 
from the previous two proteins. Although each chain has 
six nearest neighbors, only four of them (forming two in- 
terfaces) are sufficiently close to contribute to the pair 
attraction (Fig. [ip). The resulting distance at crystal 
contact for interface Ic is congruent with the minimum 
of the PMF (solid), but 2c's distance in the crystal is 
slightly smaller than the PMF minimum. This last dis- 
crepancy might be due to a slight compression of the 
sample upon its freezing for data collection, as we do 
not detect any protein distortions associated with the re- 
straints placed on the iron site sitting opposite to the 
interface (Fig. |A.5 ). In any case, the difference is rather 
small and the overall interaction free energy minimum is 
consistent with the crystallization conditions, as we find 
below. The hydrophobic component on interface Ic is 
relatively small. The attraction is dominated by a salt- 
bridge between Arg50 and Asp35, which is stable between 
1.9 and 2.1 nm, while at larger distances water fills the 
gap between the two proteins. The PMF therefore first 
plateaus, then rapidly increases (Fig. [T]C). 

Assuming that the strongest pair interaction typically 
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FIG. 1. PMF for wt-RbPf (A), mut-RbPf (B) and mut-RbPa (C). The black circles indicate the COM-COM distance in the 
crystal. The interface surfaces between neighboring proteins identify the pair-wise interactions of residues involved in crystal 
contacts |52]. 



forms first in a nucleation process, one may wonder why 
interface la in wt-RbPf is absent from the crystal con- 
tacts of mut-RbPa and vice- versa. In particular, interface 
Ic is at least twice as strong as any interface of wt-RbPf, 
so one may wonder why it is missing from the crystal con- 
tacts of wt-RbPf and its mutants. Interface Ic is charac- 
terized by the absence of the C-terminus and the muta- 
tion of Lys50 to arginine. Deleting the C-terminus allows 
the two chains to fit closely together and the arginine 
residue to form a salt bridge with Asp35 (Fig. A.9| A). 
To test whether the creation of this new interface can 
be attributed to the shorter tail of mut-RbPa, we delete 
the C-terminus of wt-RbPf and fit the two chains to the 



structure of interface Ic (Fig. |A.9p ). The PMF indicates 
that the interaction between the two chains is still neu- 
tral (Fig. |A.9p ), i.e., lysine and arginine are not here 
interchangeable. Closer examination reveals that both 
the nitrogen groups of arginine interact with the other 
chain through a salt bridge and a hydrogen bond. Con- 
versely, lysine offers a single nitrogen group to compete 
with solvation. The latter process is found to dominate, 
in agreement with lysine's solvation free energy being 
nearly twice more negative than that of arginine [55 . 



(G/L) identify the condition under which high and low 
concentration solutions of proteins can coexist. As ex- 
pected from the relatively short range of the attraction 
(~ 5 — 10%<j), in all cases the gas-liquid coexistence line is 
metastable with respect to crystallization. Observed rel- 
ative variations of the attraction width (between 1 and 
5%) and range (between 5 and 10%a) only change the 
melting temperature up to 3% and the critical tempera- 
ture up to 8%, which supports robustness of the model 
in the crystallization zone. 




C. Patchy particle models and phase diagrams 

In order to estimate the phase diagram of these pro- 
teins and to identify the facile crystallization regime, we 
use a coarse-grained model hybrid between Sear's [56] 
and Kern-Frenkel's [4¥. The proteins are represented as 
particles with a hard core of diameter a decorated by n=2 
or 3 pairs of surface attractive patches that each schema- 
tize a solvent mediated crystal contact (Fig.[2|\-B, Meth- 
ods). The parameters of the models are extracted from 
the MD simulations above (Table |l|, and the phase dia- 
grams are computed using advanced Monte Carlo meth- 
ods [57] (Fig. [3) Methods, Appendix C). The solubility 
lines (S/F) mark the coexistence condition between sol- 
vated and crystallized proteins, while the gas-liquid lines 



FIG. 2. Coarse-grained representation of the protein crystal 
(A) through a patchy particle model (B). The blue spheres 
are proteins on which each pair of patches corresponds to 
the crystal interface of the same color. C: schematic of the 
interaction between two particles through patch 2i and patch 
2i—l. The colored region represents the angular width of the 
patch. 



We find the temperature at which mut-RbPa was ex- 
perimentally crystallized (293 K) to be very near the 
model's critical temperature. This temperature has been 
suggested as optimal for crystallization [34], because the 
density fluctuations associated with criticality enhance 
the nucleation process. Crystals of mut-RbPf were in- 
terestingly detectable after a remarkably short period of 
12 hours 150 . Yet whether critical fluctuations facili- 
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tate crystallization in patchy models remains subject of 
debate [58 . It is nonetheless worth noting that the wild 
type of RbPa showed excessive nucleation in the same ex- 
perimental conditions and was only crystallized (at low 
resolution) by adding dioxane to the solution [50 . As the 
crystallization conditions for the mutant are very close to 
the critical point to begin with, a small perturbation of 
the protein-protein interaction range and width in the 
wild type may have been sufficient to tilt it over to the 
amorphous aggregation region. The patchy model phase 
diagram, therefore, suggests that increasing the crystal- 
lization temperature might have been a better strategy 
to obtain high-resolution crystals than adding a cosolute. 
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FIG. 3. Phase diagram for wt-RbPf, wt-RbPf at 3 M NaCl, 
mut-RbPf and mut-RbPa. The solid-fluid lines (S/F) for the 
three systems are represented respectively by asterisks, left- 
pointing triangles, points and circles. The gas-liquid (G/L) 
are indicated by diamonds, down-pointing triangles, stars and 
squares. The filled symbols identify the critical points. The 
dashed lines correspond to the temperatures at which the 
crystallization experiments were conducted. 



The identical experimental crystallization conditions 
for wt-RbPf and mut-RbPf, and for a series of mutants 
whose crystal contacts are closely related [48l ^2 ^ ^^S" 
gest that the phase diagram of the two proteins should 
be similar. Our removing of the C-terminus tail, which 
weakens lb, however, lowers the solubility line well below 
the experimental crystallization temperature. Although 
absent from the PDB structure, the C-terminus in the 
mutant should therefore interact with the other chain 
similarly as in the wild type for crystallization to be phys- 
ically possible. We cannot exclude that a mutant of RbPf 
missing the C-terminus could have the same crystal con- 
tacts as the wild type, but were it to be the case our 
study indicates that such a mutant would not crystallize 
under the same solution conditions. 



D. Effect of high salt concentration on wt-RbPf 

Even taking the contribution of the C-terminus into 
account the experimental crystallization temperature of 
RbPfs (277 K) is still too high for the obtained phase di- 
agram. A possible source of discrepancy is the difference 
in NaCl concentration used in the patchy model param- 
eterization compared to the experimental conditions (45 
mM vs. ~ 3 M). Salt only weakly perturbs the PMF of in- 
terfaces dominated b y hyd rophobic attraction (within the 
simulation error. Fig. A. 8), in agreement with NaCl being 
a weak salting-out agent [59], but interface 2a becomes 
significantly more attractive at 3 M NaCl (Fig. [5|. To 
better understand the origin of this effect, we examine the 
counter-ion distribution near the interface (see Appendix 
B). Interface 2 is characterized by the presence of six 
negatively (two on one chain and four on the other) and 
three positively (all on a single chain) charged residues. 
It is thus highly hydrated, as both the PDB structure 
(IBRF) and the simulations illustrate. The overall elec- 
trostatic repulsion between the two proteins at this in- 
terface is only weakly screened at low salt concentration, 
but quite screened at high salt concentration. Under the 
latter conditions, the positively charged residues at the 
edge of the interface are accessible to the ions and are 
therefore screened, but the negatively charged residues 
buried in the core of the interface are not. As a re- 
sult, like charge repulsions between residues on different 
proteins and positive-negative attractive interactions be- 
tween residues on the same protein are screened, but the 
unlike charges on different proteins at the core of the in- 
terface are essentially unscreened, generating an effective 
attraction between the two chains (Fig. [4|. More specif- 
ically, when no ions are present, the positively charged 
Lys6 interacts with Glu49 on the same chain and not 
with the polar carboxyl of Gly22 on the other chain. 
If the concentration of salt increases, Glu49 is entirely 
screened by counter ions beyond 2 A (black solid line), 
but both Lys6 and Gly22 still interact with each other. 

Reparameterizing the patchy model's interface 2a 
brings the phase diagram of wt-RbPf in remarkably close 
agreement with the experimentally observed crystalliza- 
tion conditions (Fig. [3|. 



II. THEORETICAL ASSESSMENT 

From this proof-of-concept study, we draw microscopic 
insights about both the underlying structural biology and 
soft matter motivations. 



A. Surface Entropy Reduction 

One of the main insights obtained from the statistical 
study of crystal contacts is that surface entropy reduc- 
tion should facilitate crystallization. The approach has 
been successfully used to crystallize proteins in various 
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FIG. 4. Number density of counter ions as a function of dis- 
tance from the charge of three different residues: Glu49 on 
chain A (soUd), Lys6 on chain A (dashed) and the oxygen of 
Gly22 on chain B (point-dashed). Black hnes refer to the 3M 
NaCl simulations, grey lines to the 0.5 M NaCl simulations. 
The inset shows the location of the residues at crystal con- 
tact. At low salt concentration Glu49 interacts with Lys6 on 
the same chain (arrow a), while at higher salt concentration 
Glu49 is screened and Lys6 interacts with the carboxyl group 
of Gly22 on the other chain (arrow b). 
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FIG. 5. PMF of interface 2 for wt-RbPf at different salt con- 
centrations. 



contexts \10^ and to rationalize the results of other stud- 
ies [60]. Yet it is not universal, and has but a weak mi- 
croscopic justification. The method is reasonable for in- 
teractions driven by hydrophobicity, which are disrupted 
by the presence of polar or charged residues [61], or 
for interactions characterized by backbone-backbone hy- 
drogen bonding, which are favored by the presence of 
small residues. For crystal contacts with polar side-chain 
interactions, however, the mutation/deletion of highly- 
entropic residues may have less straightforward effects. 

(i) Mutating the highly-entropic charged residues may 
weaken protein-protein interactions, because it deletes 
the source of favorable salt bridges between chains. In 
interface Ic of mut-RbPf the mutation of either Arg50 or 
Asp35 to an alanine would break the salt-bridge, desta- 



bilize the main attractive patch, and possibly prevent 
crystallization. The observation that in this interface ly- 
sine and arginine are not interchangeable, in spite of their 
similar conformational entropy and their net charge, also 
shows that entropic effects should not be considered in 
isolation. In this specific case, the different propensity of 
forming multiple bonds and the solvation free energy play 
key roles in distinguishing the effect of the two residues. 

(ii) For wt-RbPf, the SER method specifically recom- 
mends mutating Glu49 into alanine [27^. As observed 
in the analysis of interface 2a, the electrostatic interac- 
tion of Glu49 with Lys6 at low salt concentration pre- 
vents the inter-chain interaction between Lys6 and Gly22 
and therefore crystallization. Mutating Glu49 would 
delete the responsible charge and potentially strengthen 
the pair-interaction by an alternate route than increas- 
ing the salt concentration. In this situation, the sug- 
gested mutation would likely help crystal formation not 
because it specifically reduces the surface entropy, but 
because it changes the electrostatic potential at the in- 
terface (Fig. |A.10D . 

(iii) Entropy reduction more generally can have 
counter-productive effects on protein crystallization. A 
clear example is provided by the comparison of interface 
la and lb when the C-terminus of wt-RbPf is deleted. 
This region of the chain, being very flexible, increases 
entropy so one may postulate that its deletion should fa- 
cilitate crystallization. The phase diagram of truncated 
mut-RbPf shows instead a drop of the solubility line that 
prevents crystallization under the same conditions as for 
the wild type because of hydrogen-bond deletion. 

Although a useful starting point, it thus appears that 
the SER approach could be greatly improved if additional 
information were considered. In particular, the nature of 
the surrounding amino acids along the chain might help 
identify residues that are weakening interactions by dis- 
rupting hydrophobic patches or by competing with favor- 
able inter-chain interactions. 



B. Patchy models 

The central premise that the phase behavior of crys- 
tallizing proteins could be understood from that of parti- 
cles with short-range anisotropic interactions is here ver- 
ified, at least for compact globular proteins. The typical 
regime for successful experimental crystallization is in- 
termediate between the metastable critical point and the 
solubility line at low to intermediate protein concentra- 
tion, which results in open crystal structures dominated 
by directional interactions. 

It is well understood by now that reducing the range 
and surface coverage of attraction lowers the critical 
temperature and broadens the crystallization regime 
[43l HH [62] . Yet whatever role the critical point may have 
in assisting nucleation [34l [58] is of little relevance if the 
protein is highly soluble. Our study shows that in wa- 
ter, RbPf and its mutants, for instance, do not crystallize 
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because the solubility line lies at very low temperatures 
or high densities. The addition of salt to the solution 
strengthens the pair-wise interaction, but does not sig- 
nificantly affect its range or width, except for unscreened 
electrostatic interactions. The pair interactions thus nat- 
urally results in an attraction range that lies between 1 
and 3 A, and a larger protein would shrink the fraction 
of the protein diameter <j over which the attraction is 
felt. Although tightening the fractional range by an or- 
der of magnitude dramatically affects the critical point, 
it only weakly perturbs the position of the solubility line. 
If anything, crystallization should therefore become eas- 
ier. The parameter that most controls the solubility is 
the strength of the interaction. The phase diagram in 
Fig. |3] shows that to achieve reasonable solubility at room 
temperature, the average interaction strength per patch 
should be of order 8 kJ/mol for a protein with six crystal 
contacts and of 20 kJ/mol for a protein with four. 

In patchy particle models, modifying the strength of 
the interaction is equivalent to rescaling temperature: a 
stronger interaction maps the same phase diagram to 
higher temperatures. From a practical point of view, 
tuning the strength of the interaction, however, is quite 
difficult to achieve and typically requires a detailed mi- 
croscopic understanding. Our simulations show that even 
small details, such as mutating a lysine to an arginine, or 
the preferential screening observed at high salt concen- 
tration, can significantly affect the interaction, urging for 
more systematic atomistic-level studies in this direction. 

A common assumption in patchy particle models is 
that the patch-patch interactions responsible for crystal- 
lization are essentially identical. Our simulations clearly 
show it not to be the case in proteins. Comparing the 
phase diagram of wt-RbPf and mut-RbPa indicates that 
it may be more efficient to have more weak patches than 
fewer strong patches, in order to decrease the solubility 
of a protein. If we rescale the temperature over the aver- 
age energy per particle in the crystal (~10 ksT for wt- 
RbPf and ~18 ksT for mut-RbPa), the solubility line of 
mut-RbPa drops below that of wt-RbPf, indicating that 
mut-RbPf is crystallized by making up for the absence 
of an interface by having a drastically increased attrac- 
tion strength for the others. This observation raises the 
question of how the distribution of energy across patches 
affects the phase diagram, which has so far received little 
attention. 

In this paper, we have studied the interaction of three 
closely related proteins of the rubredoxin family using an 
approach that bridges structural biology and soft mat- 
ter descriptions of protein crystallization. It allowed us 
to characterize representative values of protein-protein 
interactions and to obtain reasonable phase diagrams, 
providing a microscopic explanation as to why certain 
simple mutations dramatically affect the crystallization 
behavior. The analysis even offers predictive capabili- 
ties. Overall, this approach is well-suited for simple pro- 
teins, especially those for which some structural infor- 
mation is available from analogs. Besides the increased 



computational cost, considering more complex proteins 
would require the introduction of additional features to 
the schematic models. For example, some proteins as- 
semble in more than one crystal form and consequently 
should be characterized by a larger set of patches. The 
position of the patches may also not be kept fixed if con- 
formational changes occur on a timescale similar to crys- 
tallization. Further work in this direction could be fruit- 
ful from both a theoretical and an experimental point of 
view. 



III. METHODS 

A. Molecular Dynamics simulations 

Molecular dynamics (MD) simulations are performed 
with the Gromacs package [63] using the Amber99sb 
forcefield [64] and explicit TIP4P water |63 with Ewald 
summation. Preliminary simulations revealed problems 
in the stability of the protein around the iron site 
(Fig. A.l), which are corrected with chemically justi- 
fied constraints (Appendix B). The corrected MD reca- 



pitulates experimental data (Fig. A.2). The PMF as a 
function of radial distance is obtained with the umbrella 
sampling and weighted histogram package implemented 
in Gromacs [66 . Separate simulations are run to resolve 
the angular dependence of the interaction and to analyze 
the ions behavior around the crystal contacts (Appendix 
B). 



B. Patchy particle model 

We use a patchy particle model, where each particle 
carries i = 1, . . . , n pairs of patches. Patch 2i interacts 
only with 2i — 1, as in the Sear's model [56 , while the 
range and width of the interactions are independent pa- 
rameters, as in the Kern-Frenkel model [44 . In contrast 
to both Sear's and Kern-Frenkel 's models, which assume 
the same interaction for each pair of patches, we allow 
the interaction to change from one pair of patches to the 
other, capturing the pair-wise nature of interaction at the 
crystal contact. 

The pair- wise interaction between particles 1 and 2, 
whose centers are a distance ri2 apart, is 

n 

^(ri2, ^1,^2) = (t>Hs{ri2) + ^[02i,2i-l(ri2, ^1, ^2) 



i=l 



+02i-l,2i(n2,^l,^2)], 



(1) 



where f^i and ^2 are the Euler angles. A hard-sphere 
(HS) potential captures the volume exclusion 
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where cr is the diameter of the particle. The patch-patch 
interaction is the product of a radial and an angular com- 
ponent 



wt-RbPf mut-RbPf mut-RbPa 



^2i,2i-l(n2,^l,^2) = ^2i,2i-l(n2)^2i,2i-l(^l,^2), 



where 



and 



'2i,2i-l 
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r > AiCT 



(3) 



(4) 



71,2? 



< 82i and 6>2,2i- 
otherwise 



< 5o 



'^2i,2i-l(^l, ^2) 

(5) 

The range of the interaction is in units of a between 
patch 2i and patch 2i — 1, 82% is the semi- width of patch 
2i and 0i^2i is the angle between the vector ri2 and the 
vector defining patch 2i on particle 1, as illustrated in 
Figj2p. An analogous definition holds for ^2,2i-i- 

The parameters of this model are obtained from the 
MD simulations (Table The depth of the well e cor- 
responds to the minimum of the PMF and the effective 
interaction range is such that the second virial co- 
efficient of the modeled interaction matches the actual 
protein-protein interaction at the crystal contact. The 
angular distribution of th e con figurational space sets the 
width of the patches (Fig. A.6), under the constraint that 
sin((52i) < {2\i)~^ to guarantee that each patch forms no 
more than one bond. 



C. Phase diagram 

The gas-liquid line of the phase diagram is obtained 
using the Gibbs ensemble method [67 and the critical 
temperature and density using the law of rectilinear di- 
ameters [68j. The solubility line is computed integrating 
the Clausius-Clapeyron equation starting from a first co- 
existence point, determined using free energy calculations 
and thermodynamic integration [571 168] (Appendix C). 



a{nm) 


2.2 


2.2 


2 


Ai (a) 


1.09 


1.05 


1.06 


ei {ksT) 


5.5 


2 


10.03 


cos(^i) 


0.97 


0.97 


0.93 


C0S(^2) 


0.94 


0.94 


.89 


A2 (a) 


1.1 


1.1 


1.04 




0.2 


0.2 


8.02 


COS (^3) 


0.9 


0.9 


0.95 


C0S(^4) 


0.89 


0.89 


0.92 


A3 (a) 


1.05 


1.05 




€3 (ksT) 


4.35 


4.35 




C0S(^5) 


0.89 


0.89 




C0S(^6) 


0.89 


0.89 





TABLE I. Model parameters for wt-RbPf and mut-RbPa 



crystal. The first neighbors of each protein are manually 
determined with direct visualization and the interface be- 
tween two neighboring proteins was built using Ciel [52j . 



Appendix B: Molecular dynamics (MD) simulations 

Due to the lack of a parameterization for the iron atom 
in the forcefield, we bind the iron to the rest of the pro- 
tein by treating the Fe-S bonds as covalent bonds in the 
topology file. The simulation box for each kind of MD is 
prepared as follows: water and ions are added to the pro- 
tein, followed by energy minimization (steepest descent 
integrator) and equilibration at constant volume for 100 
ps with position restraints on the heavy atoms (O, C, S 
and N). The MD simulations described in the following 
sections are then run with the parameters reported in ta- 
ble |A.1[ using the first ns as equilibration. Although the 
simulations are all run at 300 K, the PMF results hold in 
a surrounding temperature range, as long as the protein 
and the solvent do not undergo major conformational or 
physical changes. 
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Appendix A: Crystal reconstruction and interface 
determination 

The three proteins studied share the same low- 
symmetry space group P2i2i2i, and a single chain is 
present in the asymmetric unit for all investigated struc- 
tures. The package XPAND^69j is used to reconstruct the 



We first test the adequacy of the forcefield running 
unconstrained simulations of a single copy of each pro- 
tein in solution for 10 ns (individual unconstrained MD). 
Unsurprisingly, the forcefield fails in representing the dy- 
namics of the protein around the iron. All three protein 
structures undergo a large conformational change after 
about 4 ns in which the loop between residues 35 and 
45 opens due to a sh ift of the segment between residue 
41 and 48 (Fig. |A.1| and Fig. |A.2| \) and the breakage 
of several hydrogen bonds that should nominally be sta- 
ble [70]. This behavior is in disagreement with both the 
B-factor in the PDB file (PDB code: IBRF) [48 and the 
root mean square deviation (rmsd) in NMR studies (PDB 
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Parameter 


Value 


Forcefield 


Amber99sb 


Water model 


Tip4pEW 


Temperature 


300 K 


Temperature control 


Nose-Hoover thermostat 


Pressure 


1 bar 


Pressure control 


Parrinello-Rahman barostat 


Box dimension 


6 nmx6 nmxl2 nm 


Electrostatic method 


PME 


Coulomb radius 


1.4 nm 


Van der Waals method 


Cut-off 


Van der Waals radius 


1.4 nm 


Integration step 


2 f s 


Constraint algorithm 


Lines 


Salt concentration 


45 mM/3 M of NaCl 



unconstrained 



TABLE A.l. MD simulations parameters 



code: IZRP) [TT th at do not report strong fluctuations 
in this region (Fig. |A.2p ). It is unclear if the discrep- 
ancy is caused by an artifact due the artificial restraints 
that we use to bind the iron atom to the protein, by a 
poor parameterization of the cysteine by the forcefield 
when the metal is present, or by the failure of the force- 
field to capture the behavior of more subtle interactions 
around the site [70 . Changing forcefield (CHARMM) 
[72] and water model (TIP3P) does not improve the re- 
sults. To address this problem we introduce additional 
constraints on the NH- • -S hydrogen bonds length [70] 
near the iron using a harmonic potential with spring con- 
stant equal to 10000 kJ/(mol nm^). Because these con- 
straints do not affect the surface of the protein involved 
in crystal contacts, they should not bias our analysis, 
yet they successfully prev ent the conformational change 
of the protein (Fig. A.l and Fig. |A.2^ ) and recapitu- 
late the experimental data remarkably well (Fig. 



A.2 



and C). The discrepancy between the high B-factors and 
low fluctuations in the MDs and NMR for residues 26, 
30 and 31 (fig. |A.2p ) might be caused by some crystal 
abnormalities. Although no alternate conformations are 
present, residues 30 and 31 show a considerable amount 
of negative density, which can be explained by a partial 
decarboxylation due to radiation damage [73]. A better 
parameterized refinement of the model to account for this 
damage could improve the B-factor values. 



2. Umbrella sampling to determine the potential of 
mean force 



The umbrella sampling and weighted histogram pack- 
age implemented in Gromacs [66 are used to determine 
the PMF at each crystal contact of every protein. The 
starting configuration for umbrella sampling are gener- 
ated pulling one protein along the reaction coordinate 
while keeping the other one fixed. The reaction coordi- 




Loop 1 ■ 



constrained 
J 



.00 p 2 




FIG. A.l. Qjoi trace along a simulation of an individual protein 
with and without the constrained hydrogen bonds. The color 
changes from red to blue along the simulation. 



nate is identified by calculating the inertia tensor of the 
interface and its principal axes. Two of the axes define 
the plane of the interface, and the third, which is orthog- 
onal at the interface, is used as reaction coordinate. 

During the pulling, we control the reciprocal orienta- 
tion of the two proteins as in Ref. |47j . We first determine 
the four most stable heavy atoms in the structure from 
individual MD simulations, such that the tetrahedron de- 
fined by these four atoms spans most of the protein struc- 
ture, to be the of Ile7, Prol9, Asp35 and Glu52. For 
mut-RbPf, we use the Co. of Lys50 instead of Glu52, as 
the latter residue is not reported in the PDB file. We 
then restrain the angles between the edges of the two 
tetrahedra with a spring constant of 1000 kJ/ (mol rad^). 
These restraints prevent the proteins from rotating with 
respect to each other, yet allow ample freedom in both 
backbone and sidechain fluctuations. Fig. |A.2p and G 
compare the fluctuations and the chemical shifts for one 
of these simulations (first window of interface lb), for 
the isolated MDs and for the NMR. Although the fluctu- 
ations in the umbrella sampling MD (US MD) are lower 
than those in the isolated MD, probably because of the 
presence of another chain close by, they do not show any 
significant distortion and remain in agreement with the 
NMR data. The starting configurations are sampled ev- 
ery 1 A up to a distance of 15 A with a spring constant of 
1000 kJ/(mol nm^). The simulations are run for a total 
of 10 ns. These parameters typically provide sufficient 
overlap between the distributions from each window to 
perform histogram reweighting. In cases where the PMF 
gradient is too high, the window size is reduced to 0.5 A, 
so as to preserve a reasonable overlap between neighbor- 
ing windows. 

We estimate the accuracy of the umbrella sampling 
using 100 bootstraps. The resulting standard deviation 
associated with the minimum of the PMF is between 2 
and 3 kJ/mol, which corresponds to a 10-20% relative 
error for the strength of the interaction. Replicates of 
the simulations (three) provide a similar estimate. This 
uncertainty does not include further errors introduced 
by the limitations of the forcefield, but is in line with the 
overall robustness of the parameters used for the phase 
diagram determination. 
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FIG. A. 2. Comparison between MD results and experimental data. A: Total rmsd of the Cq^s along the individual unconstrained 
simulation (black) and the individual constrained simulation (red). B: Ca fluctuations in the unconstrained isolated MD 
simulations (MD uncostr), the isolated MD simulations with constrained hydrogen bonds (MD constr), the umbrella sampling 
MD simulations (US MD), the NMR fluctuations (NMR, PDB code: IZRP) [TT and the B-factor reported in the crystal 
structure (X-ray, PDB code: IBRF) 48 . The Ca involved in loop 1, 2 and 3 (Fig. |A.l| are highlighted respectively in red, 
green and blue. C: Chemical shifts and relative standard deviation obtained with SPARTA [74J using 800 frames from the 
equilibrated constrained isolated MDs and the equilibrated US MD compared with the experimental chemical shifts of the 
NMR structure [7T] . 




4 6 8 

Energy minimum (l<gT) 



FIG. A. 3. Scatter plot of the minimum of the pmf in units 
oi ksT and the solvation area of the interface for the crystal 
contacts in wt-RbPf and in mut-RbPa. 

Hydriphobicity 



Low 



High 




a25 

0.2 



NMR 

- MD constr 
US MD A 
US MDB 




FIG. A. 5. Comparison between the rmsd from NMR of wt- 
RbPf (NMR) [n], from the constrained individual MDs on 
mut-RbPa (MD constr) and from chain A (US MD A) and 
chain B (US MD B) in the first umbrella sampling window of 
interface 2c. The iron site of chain A is involved in the crystal 
contact. The good agreement between NMR and US MD for 
both chains in loop 1 and 2 (red and green highlighted regions) 
suggests that the constraints on the iron do not introduce any 
detectable distortion and are, consequently, unlikely to be the 
source of disagreement between the position of the minimum 
in interface 2c and the distance at crystal contact. 



FIG. A. 4. Projection of the thee crystal contacts of wt- 
RbPf labeled as in Fig. 1. Each tile represents a residue 
involved in the interface colored depending on its hydropho- 
bicity level [54] , 



3. Angular dependence of the interaction 

In order to investigate the angular component of the in- 
teraction, we run four independent 5 ns-long simulations 
restraining the distance between the centers of mass of 
the two proteins for each interface with a spring constant 



of 1000 kJ/(mol nm^). Assuming that around the crys- 
tal contact different configurations are visited according 
to the Boltzmann distribution, the sampled ensemble of 
angles provides an estimate of the pair-wise interaction 
width. Given the model notation (see Methods), we de- 
fine 82% = min[asin(2^), acos(l — 2cr2i)], where a2i is the 
standard deviation of the cos^^) distribution obtained 
with these simulations (Fig. A.6). 
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FIG. A. 6. Sample determination of the angular width of 
the interaction. The histogram represents the distribution 
of cos(^i) for interface la. The red line identifies the position 
of 2 standard deviations, which is used to parameterize the 
model. 




ns-long MD simulations. We constrain the center of mass 
of the proteins and their reciprocal orientation to that of 
the crystal. Frames selected every 20 ps are used to draw 
statistics over the ions located around the interface. The 
number density of the ions is obtained normalizing over 
the available volume defined as the total volume minus 
the volume occupied by two spheres of diameter <j and 
centered at the proteins' center of mass. 
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FIG. A.8. PMF as a function of COM-COM distance for 
wt-RbPf in a 3 M solution of NaCl. 



FIG. A. 7. PMF of two alternate protein-protein interfaces. 
Random interface 1 is built using one chain oriented as in in- 
terface la and one chain oriented as in interface 2a (solid blue 
line). Random interface 2 is the lowest energy configuration 
found by RosettaDock (dashed red line). Both show a free 
energy barrier of about 6 kJ/mol. 



4. Alternate orientations and their interaction 

To assess the specificity of the crystal contacts, we 
determine the PMF at two alternate orientations. The 
first is a hybrid between interface la and 2a, which ad- 
dresses whether one patch interacts only with its partner 
patch. For the second interface, we use the highest scored 
configuration found by RosettaDock [75l [76]. Fig. A. 7 
shows that the first interaction is mostly repulsive and 
weakly attractive at short range, in support of our patch- 
specific interactions. The second interaction, although at 
short range sightly more attractive than interface la, is 
mostly inaccessible because of the high free-energy bar- 
rier caused by the entropic cost of immobilizing the two 
C-termini. 



Characterization of ion behavior at high salt 
concentration 



To understand the behavior of ions around interfaces 
at high salt concentration (3 M of NaCl), we perform 4 



> Asp 35 



Lys50»Asn"21 \ 




2.5 3 
COM-COM distance (nm) 



FIG. A. 9. A: zoom on the interaction between Arg50 and 
the neighboring chain in interface 1 in mut-RbPa. The solid 
line identifies the salt-bridge with Asp35 and the dashed line 
the hydrogen bond with the carbonyl group of Asn21. B: 
interaction of Lys50 with the neighboring chain after deleting 
the last residue of wt-RbPf and fitting the structure in order 
to overlap the crystal contact of mut-RbPa. The Lysine can 
either form the hydrogen bond or the salt-bridge, but not 
both. Bond lengths (in A) are reported above the bond lines. 
C: PMF for the interface represented in panel B, showing a 
neutral interaction between the two chains. 
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Appendix C: Determination of the phase diagram 

The phase diagram is determined using advanced 
Monte Carlo (MC) techniques. Gibbs Ensemble MC sim- 
ulations directly determine the coexistence densities of 
the metastable gas and liquid phases [67] . We simulate a 
total of N =1000 particles for 10^ MC cycles where each 
cycle consists on average of N particle displacements, TV 
particle rotations, N/10 particle swaps, and 2 volume 
V changes. The critical temperature Tc and density are 
then estimated using the law of rectilinear diameters [68 . 

Because the gas-liquid line is metastable, we observe 
that for certain models crystallization happens so quickly 
that we are unable to determine the gas and liquid den- 
sities. In such cases (i.e. wt-RbPf at 3M of NaCl), we 
estimate the critical temperature as the lowest temper- 
ature at which phase separation is not observed within 
10^ MC cycles, which gives an error of about 3 K on the 
position of the critical point. 

To determine the fluid-solid coexistence curve, we inte- 
grate the Clausius-Clapeyron equation starting from one 
coexistence point using a fourth order predictor-corrector 
algorithm [29]. The temperature step size is A/3 = 0.05 
for wt-RbPf and mut-RbPf and Af3 = 0.02 for mut- 
RbPa. 

The coexistence point is determined using free energy 
calculations. The free energy of the fluid is computed 
using thermodynamic integration from the free energy 
of an ideal gas [57]. The free energy of the crystal is 
determined using for reference the Einstein crystal with 
fixed center of mass [77] with Hamiltonian 



the vector defining patch j of particle i and the corre- 
sponding vector in the Einstein crystal. As explained in 
Ref. [57] , the free energy of the reference Einstein crystal 
can then be written as 



where 



M^trans 



and 



COM 



In 



^COM 
^Ein 



37V- 1 
2 N 



^COM 



^COM 



In 



2N 



(C2) 



In TV (C3) 



dO sm{0)d(l)dx exp[-f3^orf{0, (j), x)] 

(C4) 

The calculation of a^^i^f straightforward, but that 
of a^^^ requires either a tedious numerical integration, 
as in Ref. [78l, or an analytical approximation. We opt 
for the latter using a saddle point approximation, which 
is very efficient for high values of /3^or such as those used 
here, because the integrand is sharply peaked. Defining 
(6^0 7 </>0 7 Xo) as the reference orientation in the Einstein 
crystal and changing variable a = (cos(6>), 0, x) gives 



(Korf/^ det{H[f{ao)]y/^ (/3Cor)3/2 det(F[/(ao)])i/2 



such that 



N 



N 



^^'"(^trans, ^or) = ^trans ^(r^ -ri,o)^+^or ^ f{Oi, ^i, Xi), 
i=l i=l 

(CI) 

where f{Oi,(j)i,Xi) = 1 - cos(V^i,i) + 1 - cos(V^i,2) , 
{Oi^(j)i,Xi) ai"6 the Euler angles describing the orienta- 
tion of particle i and tpij is the angle formed between 



COM 



■HPU) + -ln(8^det(i7[/(ao)])), (C5) 



where det(i^[/(ao)]) is the determinant of the Hessian 
of function / computed at ao- The analytical expression 
of /(a) is reported below defining y = cos(^) and ( as 
the angle between the vectors identifying patch 1 and 2, 
which does not depend on the orientation of the particle. 



f{oc) = \{^- 3\/l - ^2 cos(0) cos((/)o) sin(6>o) - \/l - cos(x) cos(xo) sin(l9o) - 2y^l-y'^ cos((/)) cos(C) sin((/)o) sin(xo) si 

- 2?/cos(xo) cos(C) sin(l9o) sm{()2y cos{(j)) cos(x) sin((/)o) sin(xo) sin^(C) - 2ycos{(j)) cos((/)o) sin(x) sin(xo) sin^(C) 

— \/r^^^cos(x) cos(xo) sin(^o) sin^(C) + ycos{(j)) cos((/)o) cos(x) sin(^o) sin(2C) + cos{(j)) sin((/)o) sin(x) sin(^o) sin(2C) 
- cos(^o) [—^y + cos(0o) cos(xo) sin(0) sin(x) + cos^(C) {—y + vcosiy) cosfvn) sinfc^) sinfc^n) 



^ cost^f7o; [-oy ^ cost^^o; <^ust^xo; snu^^; smt^x; ^ \-V ^ ^cos(x) cos(xo) sin(0) sin(0o) 

- cos(^o) cos(xo) sin(0) sin(x)) - 2vT^^^cos(x) cos(C) sin(C) + ^sin^(C) + cos(0o) cos(xo) sin(0) sin(x) sin^(C) 

- 2cos(0) cos(xo) sin(^o) sin(x) sin^(C) - ^cos(x) cos(xo) (2cos(0) cos(^o) sin(C) + sin(^) sin(0o)(l + sin^(C))) 

+ ^\-y^ cos(^) cos(^o) cos(xo) sin(2C) +\/l - y^ cos(xo) sin(^) sin(0o) sin(2C) 

[- sin((/)o) (2 sin(x) sin(xo) sin^ (C) + cos((/)o) (-2 cos(C) sin(x) sin(6>o) sin(C) 
l(Xo) (-2^cos(x) sin2(C) + ^1 - sin(2C)))] } . 
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FIG. A. 10. Comparison between the effect of salt and the 
effect of the surface entropy reduction (SER) method on in- 
terface lb. Following Ref. [21], we characterize the SER PMF 
on the E30A/E31A/E49A mutant of RbPf at 45 mM of NaCI, 
confirming that the interaction is attractive. Nevertheless, 
the interaction is weaker at high salt concentration, suggest- 
ing that the main contribution comes from the removal of the 
charged Glu49, not from entropy reduction (see Discussion). 



Once the free energy of the reference crystal is known, 
the free energy of the actual crystal is calculated following 
the standard protocol [78]. Several simulations along an 
isobar starting from the fluid and from the crystal are 
then necessary to determine the temperature at which 
the chemical potential of the two phases coincides [ST] 
[75] . The details of the coexistence points are reported in 
Table 1X1 



Protein model 


T 


P 


Pfluid 


Pcrystal 


wt-RbPf 


0.71 


0.35 


0.45 


0.85 


mut-RbPf 


0.35 


0.35 


0.59 


0.88 


mut-RbPa 


1.18 


0.35 


0.48 


0.71 



TABLE A. 2. Temperature-pressure reference coexistence 
point between the fluid and the crystal for wt-RbPf, mut- 
bPf and mut-RbPa, where T and P stand for temperature (in 
300 K) and pressure (in ksT/a^) and pfluid and pcrystai stand 
for the fluid and crystal density (in cr~^), respectively. 
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